A Study of Entanglement in a Categorical Framework of Natural Language
In both quantum mechanics and corpus linguistics based on vector spaces, the
notion of entanglement provides a means for the various subsystems to
communicate with each other. In this paper we examine a number of
implementations of the categorical framework of Coecke, Sadrzadeh and Clark
(2010) for natural language, from an entanglement perspective. Specifically,
our goal is to better understand in what way the level of entanglement of the
relational tensors (or the lack of it) affects the compositional structures in
practical situations. Our findings reveal that a number of proposals for verb
construction lead to almost separable tensors, a fact that considerably
simplifies the interactions between the words. We examine the ramifications of
this fact, and we show that the use of Frobenius algebras mitigates the
potential problems to a great extent. Finally, we briefly examine a machine
learning method that creates verb tensors exhibiting a sufficient level of
entanglement. Comment: In Proceedings QPL 2014, arXiv:1412.810
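One common way to quantify how entangled a relational verb tensor is: view the tensor as a matrix over two noun spaces and take the entropy of its normalised squared singular values (its Schmidt coefficients). Entropy 0 means the tensor is separable (rank 1). The sketch below is a minimal illustration under stated assumptions, not the paper's code. It assumes NumPy, the function names are hypothetical, and it collapses the sentence space to a scalar so that the effect of separability is easy to see.

```python
import numpy as np

def entanglement_entropy(verb: np.ndarray) -> float:
    """Entropy (bits) of the normalised Schmidt weights of a verb matrix.

    The singular values of the matrix are its (unnormalised) Schmidt
    coefficients; entropy 0 means separable, larger means more entangled.
    """
    s = np.linalg.svd(verb, compute_uv=False)
    p = s**2 / np.sum(s**2)      # normalised Schmidt weights
    p = p[p > 1e-12]             # drop numerical zeros before taking logs
    return float(-np.sum(p * np.log2(p)))

def sentence_score(subj: np.ndarray, verb: np.ndarray, obj: np.ndarray) -> float:
    # Illustrative scalar composition subj^T . verb . obj (sentence space
    # collapsed to one dimension, an assumption made only for this sketch).
    return float(subj @ verb @ obj)

rng = np.random.default_rng(0)
a, b = rng.normal(size=4), rng.normal(size=4)
separable = np.outer(a, b)            # rank-1 verb a (x) b
entangled = rng.normal(size=(4, 4))   # generic full-rank verb

subj, obj = rng.normal(size=4), rng.normal(size=4)
print(entanglement_entropy(separable))  # ~0
print(entanglement_entropy(entangled))  # > 0
# With a separable verb the score factorises as (subj . a)(b . obj):
# subject and object no longer interact through the verb.
print(np.isclose(sentence_score(subj, separable, obj), (subj @ a) * (b @ obj)))
```

The factorisation in the last line is the kind of simplification the abstract refers to: once the verb is (almost) separable, each argument only meets its own half of the verb.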
Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning
Deep compositional models of meaning acting on distributional representations
of words in order to produce vectors of larger text constituents are becoming
a popular area of NLP research. We detail a compositional distributional
framework based on a rich form of word embeddings that aims at facilitating the
interactions between words in the context of a sentence. Embeddings and
composition layers are jointly learned against a generic objective that
enhances the vectors with syntactic information from the surrounding context.
Furthermore, each word is associated with a number of senses, the most
plausible of which is selected dynamically during the composition process. We
evaluate the produced vectors qualitatively and quantitatively with positive
results. At the sentence level, the effectiveness of the framework is
demonstrated on the MSRPar task, for which we report results within the
state-of-the-art range. Comment: Accepted for presentation at EMNLP 201
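To make "the most plausible sense is selected dynamically" concrete, here is a deliberately simplified, hypothetical stand-in: each word stores several sense vectors, and the sense with the highest cosine similarity to the averaged context is chosen. In the paper the selection is learned jointly with the composition layers; the averaging, the cosine criterion, the NumPy dependency and the toy vectors below are all assumptions for illustration.

```python
import numpy as np

def context_vector(neighbours: list[np.ndarray]) -> np.ndarray:
    # Simple average of the surrounding words' vectors (assumption).
    return np.mean(neighbours, axis=0)

def select_sense(senses: np.ndarray, context: np.ndarray) -> int:
    """Index of the sense vector (row of `senses`) with highest cosine to `context`."""
    senses_n = senses / np.linalg.norm(senses, axis=1, keepdims=True)
    context_n = context / np.linalg.norm(context)
    return int(np.argmax(senses_n @ context_n))

# Hypothetical 3-d space: "bank" with a finance sense and a river sense.
bank_senses = np.array([[1.0, 0.0, 0.1],   # sense 0: finance
                        [0.0, 1.0, 0.1]])  # sense 1: river
money = np.array([0.9, 0.1, 0.0])
loan = np.array([0.8, 0.0, 0.2])
water = np.array([0.1, 0.9, 0.0])

print(select_sense(bank_senses, context_vector([money, loan])))  # 0
print(select_sense(bank_senses, context_vector([water])))        # 1
```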
Resolving Lexical Ambiguity in Tensor Regression Models of Meaning
This paper provides a method for improving tensor-based compositional
distributional models of meaning by the addition of an explicit disambiguation
step prior to composition. In contrast with previous research, where this
hypothesis was successfully tested only on relatively simple compositional
models, in our work we use a robust model trained with linear regression. The
results of two experiments show the superiority of the prior
disambiguation method and suggest that the effectiveness of this approach is
model-independent.
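A regression-trained tensor model can be sketched as learning a verb matrix W such that W @ o approximates the observed vector of the phrase "verb object", for many (object, phrase) pairs. The sketch below uses closed-form ridge regression with NumPy; it is an assumption-laden illustration, not the paper's training setup. In the prior-disambiguation setting, the input vectors would already be sense-specific before this step; the disambiguation itself is not shown, and the data are synthetic.

```python
import numpy as np

def fit_verb_matrix(objects: np.ndarray, phrases: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Ridge estimate of W with W @ o_i ~ p_i for each training pair.

    objects: (n, d) argument vectors (assumed already disambiguated)
    phrases: (n, d) observed "verb object" phrase vectors
    """
    d = objects.shape[1]
    # Closed form: W = P^T O (O^T O + lam I)^{-1}; solve() returns W^T.
    gram = objects.T @ objects + lam * np.eye(d)
    return np.linalg.solve(gram, objects.T @ phrases).T

def compose(verb_matrix: np.ndarray, obj: np.ndarray) -> np.ndarray:
    # Verb-object phrase vector as a matrix-vector product.
    return verb_matrix @ obj

rng = np.random.default_rng(1)
d, n = 5, 50
true_w = rng.normal(size=(d, d))
objects = rng.normal(size=(n, d))   # synthetic stand-ins for object vectors
phrases = objects @ true_w.T        # synthetic, noiseless phrase vectors
w = fit_verb_matrix(objects, phrases, lam=1e-8)
print(np.allclose(w, true_w, atol=1e-5))  # True
```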
Compositional Distributional Semantics with Compact Closed Categories and Frobenius Algebras
This thesis contributes to ongoing research related to the categorical
compositional model for natural language of Coecke, Sadrzadeh and Clark in
three ways: Firstly, I propose a concrete instantiation of the abstract
framework based on Frobenius algebras (joint work with Sadrzadeh). The theory
addresses shortcomings of previous proposals, extends the linguistic coverage of
the framework, and is supported by experimental work that improves on existing results.
The proposed framework describes a new class of compositional models that find
intuitive interpretations for a number of linguistic phenomena. Secondly, I
propose and evaluate in practice a new compositional methodology which
explicitly deals with the different levels of lexical ambiguity (joint work
with Pulman). A concrete algorithm is presented, based on the separation of
vector disambiguation from composition in an explicit prior step. Extensive
experimental work shows that the proposed methodology indeed results in more
accurate composite representations, both for the framework of Coecke et al. in
particular and for compositional models in general. As a last
contribution, I formalize the explicit treatment of lexical ambiguity in the
context of the categorical framework by resorting to categorical quantum
mechanics (joint work with Coecke). In the proposed extension, the concept of a
distributional vector is replaced with that of a density matrix, which
compactly represents a probability distribution over the different potential
meanings of the word. Composition takes the form of quantum
measurements, leading to interesting analogies between quantum physics and
linguistics. Comment: Ph.D. Dissertation, University of Oxfor
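The idea of replacing a word vector with a density matrix that encodes a probability distribution over meanings can be illustrated with the standard mixture construction rho = sum_i p_i |v_i><v_i| over unit-normalised sense vectors. This is a minimal NumPy sketch; the sense vectors and probabilities are hypothetical, and purity Tr(rho^2) is used here only as a familiar indicator of how mixed (ambiguous) the state is.

```python
import numpy as np

def density_matrix(senses: np.ndarray, probs) -> np.ndarray:
    """rho = sum_i p_i |v_i><v_i| over unit-normalised sense vectors v_i (rows)."""
    probs = np.asarray(probs, dtype=float)
    if np.any(probs < 0) or not np.isclose(probs.sum(), 1.0):
        raise ValueError("probs must be a probability distribution")
    v = senses / np.linalg.norm(senses, axis=1, keepdims=True)
    # einsum sums p_i * outer(v_i, v_i) over the senses i.
    return np.einsum("i,ij,ik->jk", probs, v, v)

def purity(rho: np.ndarray) -> float:
    # Tr(rho^2): 1 for a pure (single-meaning) state, < 1 for a mixture.
    return float(np.trace(rho @ rho))

senses = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])   # two hypothetical, orthogonal senses
rho_ambiguous = density_matrix(senses, [0.5, 0.5])
rho_single = density_matrix(senses, [1.0, 0.0])
print(np.trace(rho_ambiguous), purity(rho_ambiguous), purity(rho_single))  # 1.0 0.5 1.0
```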
Investigating the Role of Prior Disambiguation in Deep-learning Compositional Models of Meaning
This paper aims to explore the effect of prior disambiguation on neural
network-based compositional models, with the hope that better semantic
representations for text compounds can be produced. We disambiguate the input
word vectors before they are fed into a compositional deep net. A series of
evaluations shows the positive effect of prior disambiguation for such deep
models. Comment: NIPS 201
Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces
Word embedding techniques rely heavily on an abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, many words appear only rarely, or not at all, in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach provides consistent performance improvements across multiple evaluation benchmarks: in vitro, on multiple rare-word similarity datasets, and in vivo, in two downstream text classification tasks.
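One widely used cross-lingual transformation technique is to learn an orthogonal map between two embedding spaces from anchor words present in both (the orthogonal Procrustes solution). The sketch below applies that idea to a lexical-resource (graph) space and a corpus space, then maps a word seen only in the resource. It is an illustrative choice, not necessarily the transformation used in the paper; NumPy, the function name and the synthetic data are assumptions.

```python
import numpy as np

def procrustes(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||source @ W - target||_F (rows are anchor pairs)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(2)
d = 5
q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden "true" orthogonal map (synthetic)
graph_space = rng.normal(size=(20, d))        # anchor words in the graph-embedding space
corpus_space = graph_space @ q                # the same anchors in the corpus space
w = procrustes(graph_space, corpus_space)

unseen = rng.normal(size=d)                   # a word present only in the lexical resource
induced = unseen @ w                          # its induced corpus-space embedding
print(np.allclose(w, q), np.allclose(induced, unseen @ q))  # True True
```

Constraining the map to be orthogonal preserves distances and angles within the source space, which is one common reason for preferring it over an unconstrained linear map.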
Sentence entailment in compositional distributional semantics
Distributional semantic models provide vector representations for words by
gathering co-occurrence frequencies from corpora of text. Compositional
distributional models extend these from words to phrases and sentences. In
categorical compositional distributional semantics, phrase and sentence
representations are functions of their grammatical structure and
representations of the words therein. In this setting, grammatical structures
are formalised by morphisms of a compact closed category and meanings of words
are formalised by objects of the same category. These can be instantiated in
the form of vectors or density matrices. This paper concerns the applications
of this model to phrase- and sentence-level entailment. We argue that
entropy-based distances of vectors and density matrices are good
candidates for measuring word-level entailment, show the advantage of density
matrices over vectors for word-level entailment, and prove that these
distances extend compositionally from words to phrases and sentences. We
exemplify our theoretical constructions on real data and a toy entailment
dataset and provide preliminary experimental evidence. Comment: 8 pages, 1 figure, 2 tables; short version presented at the
International Symposium on Artificial Intelligence and Mathematics (ISAIM),
201
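Two standard entropy-based distances of this kind are the Kullback-Leibler divergence for (normalised) vectors and the von Neumann relative entropy S(rho || sigma) = Tr rho (log rho - log sigma) for density matrices. The NumPy sketch below computes both; it assumes real symmetric density matrices and is not the paper's code. S(rho || sigma) is finite only when the support of rho lies inside that of sigma, which gives the asymmetry one would expect for a narrower term against a broader one.

```python
import numpy as np

def kl_divergence(p, q, tol: float = 1e-12) -> float:
    """KL(p || q) in nats after normalising p and q to sum to 1."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    if np.any((q <= tol) & (p > tol)):
        return float("inf")
    m = p > tol
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def relative_entropy(rho: np.ndarray, sigma: np.ndarray, tol: float = 1e-12) -> float:
    """Von Neumann relative entropy S(rho || sigma) in nats (real symmetric inputs)."""
    pr = np.linalg.eigvalsh(rho)
    pr = pr[pr > tol]
    tr_rho_log_rho = np.sum(pr * np.log(pr))
    qs, vs = np.linalg.eigh(sigma)
    # Tr(rho log sigma) = sum_j <s_j|rho|s_j> log q_j over sigma's eigenbasis.
    weights = np.diag(vs.T @ rho @ vs)
    if np.any((qs <= tol) & (weights > tol)):
        return float("inf")   # supp(rho) not contained in supp(sigma)
    m = qs > tol
    tr_rho_log_sigma = np.sum(weights[m] * np.log(qs[m]))
    return float(tr_rho_log_rho - tr_rho_log_sigma)

narrow = np.diag([0.5, 0.5, 0.0])   # hypothetical "narrower" word
broad = np.eye(3) / 3               # hypothetical "broader" word
print(relative_entropy(narrow, broad))   # ~0.405 (= log 1.5)
print(relative_entropy(broad, narrow))   # inf
```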